Procedure of data validation

Data sources

* Energy Atlas: monthly consumption at census tract level from 2014 to 2016.
* Building Performance Database: summary stats for large public buildings (commercial, institutional, apartments, etc.)

Census tract data

Read the census-tract-level monthly energy data file “kwh_monthly.csv” from the Dropbox folder.

The following is a preview of the census tract data.

geoid        month  usage  sqft   usage_med_sqft  usetype      year
06037113401  1      -7777  -7777  -7777           agriculture  2014
06037113401  2      -7777  -7777  -7777           agriculture  2014
06037113401  3      -7777  -7777  -7777           agriculture  2014
06037113401  4      -7777  -7777  -7777           agriculture  2014
06037113401  5      -7777  -7777  -7777           agriculture  2014
06037113401  6      -7777  -7777  -7777           agriculture  2014

##          geoid          year            month            usage         
##  06037101110:   468   2014:155844   1      : 38961   Min.   :-1243980  
##  06037101122:   468   2015:155844   2      : 38961   1st Qu.:   -9999  
##  06037101210:   468   2016:155844   3      : 38961   Median :   -7777  
##  06037101220:   468                 4      : 38961   Mean   :  222199  
##  06037101300:   468                 5      : 38961   3rd Qu.:  161675  
##  06037101400:   468                 6      : 38961   Max.   :61783414  
##  (Other)    :464724                 (Other):233766                     
##       sqft          usage_med_sqft               usetype      
##  Min.   :   -7777   Min.   :-9999.000   agriculture  : 35964  
##  1st Qu.:   -7777   1st Qu.:-8888.000   all          : 35964  
##  Median :    9813   Median :-7777.000   commercial   : 35964  
##  Mean   :  453699   Mean   :-4847.515   condo        : 35964  
##  3rd Qu.:  523986   3rd Qu.:    0.344   industrial   : 35964  
##  Max.   :25966973   Max.   :  583.563   institutional: 35964  
##                                         (Other)      :251748
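
As a quick consistency check on the summary above, the row counts can be cross-checked against each other. The sketch below is in Python rather than R and uses only counts copied from the summary. The count of 13 use types is derived from the usetype column (6 named levels plus 251748 / 35964 = 7 levels under "(Other)").

```python
# Counts copied from the summary output above.
rows_per_year = 155844      # rows for each of 2014, 2015, 2016
n_years = 3
n_months = 12
rows_per_geoid = 468        # rows reported for each listed tract
rows_per_usetype = 35964    # rows reported for each listed use type
other_usetype_rows = 251748

# 6 use types are listed by name; the rest are pooled under "(Other)".
n_usetypes = 6 + other_usetype_rows // rows_per_usetype

total_rows = rows_per_year * n_years

# Each tract should contribute one row per use type, month, and year.
assert n_usetypes * n_months * n_years == rows_per_geoid

# Number of tracts implied by the row counts.
assert total_rows % rows_per_geoid == 0
n_tracts = total_rows // rows_per_geoid
assert rows_per_usetype == n_tracts * n_months * n_years

print(total_rows, n_tracts)  # 467532 999
```

The counts are mutually consistent and imply 999 distinct tracts in the file, each with a complete set of use type, month, and year rows.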

Energy data in Energy Atlas is reported for the use types listed below. Each use type is defined at https://energyatlas.ucla.edu/methods

usetype
agriculture
all
commercial
condo
industrial
institutional
multi_family
other
res
residential_other
residential_uncat
single_family
uncat

Census tract geometry data is downloaded from https://catalog.data.gov/dataset/tiger-line-shapefile-2019-state-california-current-census-tract-state-based. The file “../energyAtlas/Census Tract/la-county-census-tracts.geojson” on Dropbox is missing one census tract.

## Reading layer `la-county-boundary' from data source 
##   `/Users/yujiex/Dropbox/workLBNL/EESA/code/im3-wrf/domain/la-county-boundary.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 7 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -118.9446 ymin: 32.79521 xmax: -117.6464 ymax: 34.8233
## CRS:           4326
## Reading layer `tl_2019_06_tract' from data source 
##   `/Users/yujiex/Dropbox/workLBNL/EESA/code/im3-wrf/domain/tl_2019_06_tract/tl_2019_06_tract.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 8057 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.482 ymin: 32.52883 xmax: -114.1312 ymax: 42.0095
## CRS:           4269

There are 987 census tracts with energy data.
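
The TIGER/Line file read above covers all of California (8057 features), while the Energy Atlas geoids shown in this document all start with 06037 (state FIPS 06, county FIPS 037, Los Angeles County). The report does not say whether tracts were subset by attribute or by a spatial join with the county boundary. The following is a minimal Python sketch of the attribute approach only; `in_county` is a hypothetical helper, and the Orange County GEOID is a made-up example.

```python
def in_county(geoid: str, state_fips: str = "06", county_fips: str = "037") -> bool:
    """Return True if an 11-digit tract GEOID belongs to the given county."""
    # A tract GEOID is 2 state digits + 3 county digits + 6 tract digits.
    return len(geoid) == 11 and geoid.startswith(state_fips + county_fips)


# The first two GEOIDs appear in this document; the third is a
# hypothetical Orange County (FIPS 059) tract used only for contrast.
geoids = ["06037113401", "06037101110", "06059001101"]

la_tracts = [g for g in geoids if in_county(g)]
print(la_tracts)  # ['06037113401', '06037101110']
```

GEOIDs need to be kept as strings for this to work, since reading them as numbers drops the leading zero.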

## Spherical geometry (s2) switched off
## Reading layer `grid_with_building' from data source 
##   `/Users/yujiex/Dropbox/workLBNL/EESA/code/im3-wrf/grid_with_building.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 62 features and 3 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -118.885 ymin: 33.24275 xmax: -117.5225 ymax: 34.707
## CRS:           4326

The following compares energy use per m2 across use types.

The following compares the total building size of the four major use types in each census tract, restricted to census tracts with Energy Atlas data.

The following compares the median consumption per m2 across use types. The usage_med_sqft column in the Energy Atlas data set reports the median usage in kWh per sqft for each census tract. Negative values, which include codes such as -7777, -8888, and -9999 in the summary above, are removed before the comparison.
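
The filtering and unit change described above can be sketched as follows. This is a Python illustration, not the original analysis code; the sample values are taken from the usage_med_sqft column of the summary above (three negative codes, the third quartile, and the maximum), and the function name is an assumption.

```python
# 1 ft = 0.3048 m exactly, so 1 m2 is about 10.7639 ft2.
SQFT_PER_M2 = 1 / 0.3048 ** 2


def kwh_per_sqft_to_kwh_per_m2(value: float) -> float:
    """Convert an intensity from kWh/ft2 to kWh/m2."""
    return value * SQFT_PER_M2


# Sample usage_med_sqft values taken from the summary above.
raw = [-9999.0, -8888.0, -7777.0, 0.344, 583.563]

# Drop negative values before any comparison.
valid = [v for v in raw if v >= 0]

converted = [round(kwh_per_sqft_to_kwh_per_m2(v), 2) for v in valid]
print(valid, converted)
```

Dropping the negative values first matters because they would otherwise dominate any quartile or mean, as the negative median and mean of usage_med_sqft in the summary show.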

census tract median electricity + gas kWh/m2

usetype        data.source             min     q1      median  q3      max
commercial     Energy Atlas July 2016  2.89    6.90    9.90    12.42   53.65
commercial     Simulation July 2018    1.43    15.85   24.09   41.29   900.90
industrial     Energy Atlas July 2016  2.19    4.11    4.51    5.71    32.83
industrial     Simulation July 2018    7.16    69.62   122.74  183.50  420.94
institutional  Energy Atlas July 2016  1.98    4.17    12.83   32.76   120.08
institutional  Simulation July 2018    6.07    22.22   29.95   75.59   123.72
res_total      Energy Atlas July 2016  1.94    3.64    4.22    6.02    2410.33
res_total      Simulation July 2018    0.30    8.69    11.61   14.47   51.03

census tract median electricity + gas kBtu/ft2

usetype        data.source             min     q1      median  q3      max
commercial     Energy Atlas July 2016  9.12    21.78   31.25   39.18   169.24
commercial     Simulation July 2018    4.52    50.00   75.98   130.24  2841.96
industrial     Energy Atlas July 2016  6.92    12.95   14.22   18.00   103.58
industrial     Simulation July 2018    22.58   219.63  387.18  578.87  1327.90
institutional  Energy Atlas July 2016  6.25    13.16   40.46   103.33  378.82
institutional  Simulation July 2018    19.16   70.11   94.49   238.45  390.27
res_total      Energy Atlas July 2016  6.12    11.47   13.32   18.99   7603.61
res_total      Simulation July 2018    0.96    27.40   36.63   45.65   160.98

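
The relationship between the two tables above can be checked with the standard unit factors (1 kWh is about 3.412 kBtu, and 1 m2 is about 10.764 ft2). The Python sketch below compares the expected kWh/m2 to kBtu/ft2 factor with the ratio seen in one pair of tabulated values, the commercial Energy Atlas median.

```python
KBTU_PER_KWH = 3.412142        # 1 kWh expressed in kBtu
SQFT_PER_M2 = 1 / 0.3048 ** 2  # 1 m2 expressed in ft2, about 10.7639

# Expected multiplier to go from kWh/m2 to kBtu/ft2, and its reciprocal.
factor = KBTU_PER_KWH / SQFT_PER_M2
inverse = SQFT_PER_M2 / KBTU_PER_KWH

# Commercial Energy Atlas median from the two tables above.
kwh_m2, kbtu_ft2 = 9.90, 31.25
ratio = kbtu_ft2 / kwh_m2

print(round(factor, 4), round(inverse, 4), round(ratio, 4))
```

The expected factor is about 0.317, while the tabulated kBtu/ft2 values are about 3.155 times the kWh/m2 values, which matches the reciprocal. This suggests that either the conversion or the unit labels in one of the tables should be double-checked before the numbers are compared with other sources.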
## `summarise()` has grouped output by 'geoid', 'year', 'month'. You can override using the `.groups` argument.
## # A tibble: 16,674 × 5
##    geoid        year month usetype    usage
##    <chr>       <dbl> <dbl> <chr>      <dbl>
##  1 06037101110  2016     1 res      859922.
##  2 06037101110  2016     2 res      659038.
##  3 06037101110  2016     3 res      693066.
##  4 06037101110  2016     4 res      652339.
##  5 06037101110  2016     5 res      699242.
##  6 06037101110  2016     6 res     1089091.
##  7 06037101110  2016     7 res     1136456.
##  8 06037101110  2016     8 res     1133279.
##  9 06037101110  2016     9 res     1062885.
## 10 06037101110  2016    10 res      759223.
## # … with 16,664 more rows
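
The `summarise()` message above indicates a grouped aggregation that leaves one usage value per geoid, year, month, and usetype. The report does not show which input rows are combined, so the following Python sketch only illustrates the general pattern of a grouped sum; the usage values are made up and are not taken from the data.

```python
from collections import defaultdict

# Toy rows with made-up usage values: geoid, year, month, usetype, usage.
rows = [
    ("06037101110", 2016, 1, "res", 100.0),
    ("06037101110", 2016, 1, "res", 50.0),
    ("06037101110", 2016, 2, "res", 70.0),
]

# Sum usage within each (geoid, year, month, usetype) group.
totals = defaultdict(float)
for geoid, year, month, usetype, usage in rows:
    totals[(geoid, year, month, usetype)] += usage

for key in sorted(totals):
    print(key, totals[key])
```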
